
17 research outputs found

rFaaS: RDMA-Enabled FaaS Platform for Serverless High-Performance Computing

Author: Calotoiu Alexandru
Copik Marcin
Hoefler Torsten
Taranov Konstantin
Publication date: 25/06/2021

The rigid MPI programming model and batch scheduling dominate high-performance computing. While clouds brought new levels of elasticity into the world of computing, supercomputers still suffer from low resource utilization rates. To enhance supercomputing clusters with the benefits of serverless computing, a modern cloud programming paradigm for pay-as-you-go execution of stateless functions, we present rFaaS, the first RDMA-aware Function-as-a-Service (FaaS) platform. With hot invocations and decentralized function placement, we overcome the major performance limitations of FaaS systems and provide low-latency remote invocations in multi-tenant environments. We evaluate the new serverless system through a series of microbenchmarks and show that remote functions execute with negligible performance overheads. We demonstrate how serverless computing can bring elastic resource management into MPI-based high-performance applications. Overall, our results show that MPI applications can benefit from modern cloud programming paradigms to guarantee high performance at lower resource costs.

arXiv.org e-Print Archive

Work-stealing prefix scan: Addressing load imbalance in large-scale image registration

Author: Berkels Benjamin
Bientinesi Paolo
Copik Marcin
Grosser Tobias
Hoefler Torsten
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2022

Parallelism patterns (e.g., map or reduce) have proven to be effective tools for parallelizing high-performance applications. In this article, we study the recursive registration of a series of electron microscopy images, a time-consuming and imbalanced computation necessary for nano-scale microscopy analysis. We show that by translating the image registration into a specific instance of the prefix scan, we can convert this seemingly sequential problem into a parallel computation that scales to over a thousand cores. We analyze a variety of scan algorithms that behave similarly for common low-compute operators and propose a novel work-stealing procedure for a hierarchical prefix scan. Our evaluation shows that by identifying a suitable and well-optimized prefix scan algorithm, we reduce time-to-solution on a series of 4,096 images spanning ten seconds of microscopy acquisition from over 10 hours to less than 3 minutes (using 1024 Intel Haswell cores), enabling derivation of material properties at nanoscale for long microscopy image series.
ISSN: 1045-9219; ISSN: 1558-2183; ISSN: 2161-988

Repository for Publications and Research Data

Edinburgh Research Explorer

Publikationsserver der RWTH Aachen University
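The central idea of the work-stealing prefix scan abstract, treating a chain of pairwise image registrations as a prefix scan, can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each pairwise registration yields a 1-D affine map `(a, b)` meaning x -> a*x + b, and that the accumulated transform for image i is the ordered composition of the first i maps; real registration transforms are far richer. The names `compose`, `sequential_scan` and `blocked_scan` are illustrative only, and the blocked version shows just the basic three-phase hierarchical scan, not the paper's work-stealing procedure.

```python
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Hypothetical stand-in for one pairwise registration result: x -> a*x + b.
Affine = Tuple[int, int]


def compose(f: Affine, g: Affine) -> Affine:
    """Apply f, then g: x -> g(f(x)). Associative but not commutative."""
    a1, b1 = f
    a2, b2 = g
    return (a2 * a1, a2 * b1 + b2)


def sequential_scan(items: Sequence[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive scan: out[i] = items[0] op items[1] op ... op items[i]."""
    out: List[T] = []
    for x in items:
        out.append(x if not out else op(out[-1], x))
    return out


def blocked_scan(items: Sequence[T], op: Callable[[T, T], T], num_blocks: int) -> List[T]:
    """Three-phase hierarchical scan (num_blocks >= 1); each block could run on its own worker."""
    if not items:
        return []
    size = -(-len(items) // num_blocks)  # ceiling division
    blocks = [items[i:i + size] for i in range(0, len(items), size)]

    # Phase 1: scan every block independently (the parallel part).
    local = [sequential_scan(b, op) for b in blocks]

    # Phase 2: exclusive scan over block totals gives each block's left offset.
    offsets: List[Optional[T]] = [None]
    for blk in local[:-1]:
        prev = offsets[-1]
        offsets.append(blk[-1] if prev is None else op(prev, blk[-1]))

    # Phase 3: apply each block's offset as the LEFT operand to preserve order.
    result: List[T] = []
    for off, blk in zip(offsets, local):
        result.extend(blk if off is None else [op(off, x) for x in blk])
    return result


if __name__ == "__main__":
    maps = [(2, 1), (3, -1), (1, 4), (5, 0), (2, 2)]
    assert blocked_scan(maps, compose, 2) == sequential_scan(maps, compose)
    print(blocked_scan(maps, compose, 2))
```

Only associativity of the operator is required, so the same three phases apply to any composable transform type. When individual operator applications differ greatly in cost, as with real registrations, equal-sized blocks finish at different times, which is the load imbalance the paper addresses.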

Performance-Detective: Automatic Deduction of Cheap and Accurate Performance Models

Author: Calotoiu Alexandru
Copik Marcin
Hoefler Torsten
Koziolek Anne
Reiter Andreas
Schmid Larissa
Selzer Michael
Werle Dominik
Publication venue: Association for Computing Machinery
Publication date: 20/05/2022

The many configuration options of modern applications make it difficult for users to select a performance-optimal configuration. Performance models help users in understanding system performance and choosing a fast configuration. Existing performance modeling approaches for applications and configurable systems either require a full-factorial experiment design or a sampling design based on heuristics. This results in high costs for achieving accurate models. Furthermore, they require repeated execution of experiments to account for measurement noise. We propose Performance-Detective, a novel code analysis tool that deduces insights into the interactions of program parameters. We use these insights to derive the smallest necessary experiment design and to avoid repeated measurements when possible, significantly lowering the cost of performance modeling. We evaluate Performance-Detective using two case studies where we reduce the number of measurements from up to 3125 to only 25, decreasing cost to only 2.9% of the previously needed core hours, while maintaining a model accuracy of 91.5% compared to 93.8% when using all 3125 measurements.

Repository for Publications and Research Data

KITopen

GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra

Author: Balla Adrian
Beranek Jakub
Besta Maciej
Copik Marcin
Gianinazzi Lukas
Hoefler Torsten
Holenstein Tobias
Janda Kacper
Kalvoda Pavel
Konieczny Marek
Kwasniewski Grzegorz
Leisinger Sebastian
Lindenberger Philipp
Mutlu Onur
Ozdemir Esref
Schaffner Yannick
Schwarz Leonardo
Tatkowski Peter
Vonarburg-Shmaria Zur
Publication date: 05/03/2021

We propose GraphMineSuite (GMS): the first benchmarking suite for graph mining that facilitates evaluating and constructing high-performance graph mining algorithms. First, GMS comes with a benchmark specification based on extensive literature review, prescribing representative problems, algorithms, and datasets. Second, GMS offers a carefully designed software platform for seamless testing of different fine-grained elements of graph mining algorithms, such as graph representations or algorithm subroutines. The platform includes parallel implementations of more than 40 considered baselines, and it facilitates developing complex and fast mining algorithms. High modularity is possible by harnessing set algebra operations such as set intersection and difference, which enables breaking complex graph mining algorithms into simple building blocks that can be separately experimented with. GMS is supported with a broad concurrency analysis for portability in performance insights, and a novel performance metric to assess the throughput of graph mining algorithms, enabling more insightful evaluation. As use cases, we harness GMS to rapidly redesign and accelerate state-of-the-art baselines of core graph mining problems: degeneracy reordering (by up to >2x), maximal clique listing (by up to >9x), k-clique listing (by 1.1x), and subgraph isomorphism (by up to 2.5x), also obtaining better theoretical performance bounds.

arXiv.org e-Print Archive

Repository for Publications and Research Data
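The GraphMineSuite abstract's point that set intersection and difference decompose mining algorithms into small building blocks can be illustrated with maximal clique listing. The sketch below is the classic Bron–Kerbosch algorithm with Tomita-style pivoting, written purely in terms of Python `set` operations. It is a textbook baseline chosen for illustration, not GMS's own API or its accelerated variant, and `from_edges`, `_expand` and `maximal_cliques` are hypothetical helper names.

```python
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[int, Set[int]]


def from_edges(edges: Iterable[Tuple[int, int]]) -> Graph:
    """Build an undirected adjacency-set representation (no isolated vertices)."""
    g: Graph = {}
    for u, v in edges:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g


def _expand(g: Graph, r: Set[int], p: Set[int], x: Set[int], out: List[Set[int]]) -> None:
    # R: current clique, P: candidates that can extend R, X: vertices already explored.
    if not p and not x:
        out.append(r)  # nothing can extend R, so it is maximal
        return
    # Pivot u maximizes |P ∩ N(u)|; only vertices in P \ N(u) need their own branch.
    pivot = max(p | x, key=lambda u: len(p & g[u]))
    for v in list(p - g[pivot]):
        _expand(g, r | {v}, p & g[v], x & g[v], out)  # set intersection narrows candidates
        p = p - {v}  # set difference: v has been handled
        x = x | {v}  # set union: remember v to avoid reporting duplicates


def maximal_cliques(g: Graph) -> List[Set[int]]:
    """Return every maximal clique of g exactly once."""
    out: List[Set[int]] = []
    _expand(g, set(), set(g), set(), out)
    return out


if __name__ == "__main__":
    g = from_edges([(0, 1), (1, 2), (0, 2), (2, 3)])
    print(maximal_cliques(g))
```

For the small example graph (a triangle 0-1-2 plus the edge 2-3), the function returns the cliques {0, 1, 2} and {2, 3} in some order. Each step of the recursion is a single intersection, difference, or union, which is the kind of building block the abstract describes swapping out and tuning independently.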

FaaSKeeper: a Blueprint for Serverless Services

Author: Calotoiu Alexandru
Copik Marcin
Hoefler Torsten
Taranov Konstantin
Publication date: 28/03/2022

FaaS (Function-as-a-Service) brought a fundamental shift into cloud computing: (persistent) virtual machines have been replaced with dynamically allocated resources, trading locality and statefulness for a pay-as-you-go model more suitable for varying and infrequent workloads. However, adapting services to function within the serverless paradigm while still fulfilling requirements is challenging. In this work, we introduce a design blueprint for creating complex serverless services and contribute a set of requirements for efficient and scalable FaaS computing. To showcase our approach, we focus on ZooKeeper, a centralized coordination service that offers a safe and wait-free consensus mechanism but requires a persistent allocation of computing resources that does not offer the flexibility needed to handle variable workloads. We design FaaSKeeper, the first coordination service built on serverless functions and cloud-native services. FaaSKeeper provides the same consistency guarantees and interface as ZooKeeper with a price model proportional to the activity in the system. In addition, we define synchronization primitives to extend the capabilities of scalable cloud storage services with consensus semantics needed for strong data consistency.

arXiv.org e-Print Archive

FMI: Fast and Cheap Message Passing for Serverless Functions

Author: Böhringer Roman
Calotoiu Alexandru
Copik Marcin
Hoefler Torsten
Publication venue: Association for Computing Machinery
Publication date: 21/06/2023

Serverless functions provide elastic scaling and a fine-grained billing model, making Function-as-a-Service (FaaS) an attractive programming model. However, for distributed jobs that benefit from large-scale and dynamic parallelism, the lack of fast and cheap communication is a major limitation. Individual functions cannot communicate directly, group operations do not exist, and users resort to manual implementations of storage-based communication. This results in communication times multiple orders of magnitude slower than those found in HPC systems. We overcome this limitation and present the FaaS Message Interface (FMI). FMI is an easy-to-use, high-performance framework for general-purpose point-to-point and collective communication in FaaS applications. We support different communication channels and offer a model-driven channel selection according to performance and cost expectations. We model the interface after MPI and show that message passing can be integrated into serverless applications with minor changes, providing portable communication closer to that offered by high-performance systems. In our experiments, FMI can speed up communication for a distributed machine learning FaaS application by up to 162x, while simultaneously reducing cost by up to 397x.

Repository for Publications and Research Data

Work-stealing prefix scan: Addressing load imbalance in large-scale image registration

Author: Berkels Benjamin
Bientinesi Paolo
Copik Marcin
Grosser Tobias
Hoefler Torsten
Publication date: 01/01/2020

Publikationsserver der RWTH Aachen University

Methods for abdominal respiratory motion tracking

Author: Karwan Adam
Spinczyk Dominik
Copik Marcin
Publication venue: 'Informa UK Limited'

Crossref